DeepMind's reinforcement learning framework: Acme

As I mentioned in the previous post, DeepMind has published a reinforcement learning framework called Acme.

1. Introduction

Acme provides a high-level API, so a simple training script can look like the following:

import acme

# environment and agent are constructed in advance (see Sections 3 and 4).
loop = acme.EnvironmentLoop(environment, agent)
loop.run()

Any environment that implements the DeepMind Environment API can be used in this training loop. Section 3 shows the details.

Users can select either TensorFlow (only a certain nightly build) or JAX as the deep learning framework for agents, and some agents are already included in the repository. The details are described in Section 4.

Even though Acme is described as a framework for distributed reinforcement learning in the technical report, DeepMind has not published the distributed agents and, unfortunately, has no timetable for releasing them. (See the FAQ.)

2. Installation

Acme can be installed from PyPI with pip. The package name is “dm-acme”. There are several install options. The “reverb” option is almost always necessary because it provides the replay buffer, and one of “tf” and “jax” is necessary as well.

Therefore, the installation command is

pip install dm-acme[reverb,tf]

or

pip install dm-acme[reverb,jax]

Additionally, you can use the “env” option to install some environments such as “dm-control” and “gym”.
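
After installation, simply importing the packages is a quick sanity check:

import acme
import reverb  # the replay buffer backend installed via the “reverb” option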

3. Environment

The abstract class is defined as dm_env.Environment here. One of the largest differences from gym.Env is that it returns a dm_env.TimeStep object instead of a plain Python tuple. The TimeStep class also tracks whether the step is the first (StepType.FIRST), an intermediate (StepType.MID), or the last (StepType.LAST) step in the trajectory.
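
For illustration, a typical interaction with the dm_env API looks roughly like the following sketch, where select_action stands for an arbitrary, user-supplied policy function:

def run_episode(env, select_action):
    # env follows dm_env.Environment; select_action is any policy callable.
    timestep = env.reset()               # step_type is StepType.FIRST
    while not timestep.last():
        action = select_action(timestep.observation)
        timestep = env.step(action)      # StepType.MID, or StepType.LAST at the end
        # TimeStep is a namedtuple: (step_type, reward, discount, observation)
    return timestep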

For gym.Env, Acme provides the wrapper acme.wrappers.GymWrapper:

import acme
import gym

env = acme.wrappers.GymWrapper(gym.make("MountainCarContinuous-v0"))
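
The wrapped environment then follows the dm_env API, e.g. reset() returns a dm_env.TimeStep:

timestep = env.reset()
print(timestep.step_type)  # StepType.FIRST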

4. Agent

The list of agents is shown here. Currently (August 2020), 11 agents are provided:

  • Continuous control
    • Deep Deterministic Policy Gradient (DDPG)
    • Distributed Distributional Deterministic Policy Gradients (D4PG)
    • Maximum a posteriori Policy Optimisation (MPO)
    • Distributional Maximum a posteriori Policy Optimisation (DMPO)
  • Discrete control
    • Deep Q-Networks (DQN)
    • Importance-Weighted Actor-Learner Architectures (IMPALA)
    • Recurrent Replay Distributed DQN (R2D2)
  • Batch RL
    • Behavior Cloning (BC)
  • Learning from demonstrations
    • Deep Q-Learning from Demonstrations (DQfD)
    • Recurrent Replay Distributed DQN from Demonstrations (R2D3)
  • Model-based RL
    • Monte-Carlo Tree Search (MCTS)

These agents extend acme.agents.agent.Agent. You can also create your own custom algorithms in a similar way, as in the sketch below.
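
The base class essentially combines an actor and a learner. A minimal sketch of a custom agent, assuming hypothetical MyActor and MyLearner classes that implement acme.core.Actor and acme.core.Learner:

from acme.agents import agent

class MyAgent(agent.Agent):
    # MyActor and MyLearner are hypothetical user-defined components
    # implementing acme.core.Actor and acme.core.Learner.
    def __init__(self, environment_spec):
        actor = MyActor(environment_spec)
        learner = MyLearner(environment_spec)
        super().__init__(
            actor=actor,
            learner=learner,
            min_observations=1000,      # transitions to collect before learning starts
            observations_per_step=1.0,  # ratio of observations to learner steps
        )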

Note: These predefined implementations use DeepMind’s neural network libraries, Sonnet for TensorFlow and Haiku for JAX. You might need to know these libraries to define the network(s) for the agents.
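
For example, a Q-network for a discrete-action TensorFlow agent such as DQN could be built with Sonnet roughly as follows (a sketch; the network sizes are arbitrary, and num_actions would typically come from the environment spec):

import sonnet as snt

def make_q_network(num_actions: int) -> snt.Module:
    # A simple MLP Q-network mapping observations to per-action values.
    return snt.Sequential([
        snt.Flatten(),
        snt.nets.MLP([256, 256, num_actions]),
    ])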

5. Tutorials

Acme provides an official quick start notebook. If you have a MuJoCo license, you can also try the tutorial notebook.
